Documentation for NARC.EXE
Written by Gary Conway
Infinity Design Concepts
Louisville, Kentucky
Copyright (c) 1987
Version 1.1
NARC.EXE is placed in the public domain under the user-supported
shareware concept. NARC.EXE is and will remain the property of
Gary Conway. This program may not be used in connection with
commercial ventures, nor as a sales aid, without the express written
consent of the author. If you find yourself using this program often,
please make a contribution of $20.00 to the address below.
Infinity Design Concepts
1052 Parkway Drive
Louisville, Kentucky 40217
Member IEEE
KIPCUG
PCCL
KKUG
NSPE
All new releases of NARC.EXE and all other IDC utilities
can be located -FIRST- on ;
The SoftStone RCPM FOG #24 (supporting CP/M and MSDOS)
(502)241-4109
30 MEG
300/1200 baud
24 hrs.
Louisville, Kentucky
Curt Edwards - SYSOP
Sponsored by: Kentucky Kaypro Users Group
Accounting Computer Systems
First Osborne Group
.............................................................................
........................ ............................
........................ TABLE OF CONTENTS ............................
.............................................................................
Page
WHAT IS IT ANYWAY ............................. 1
COMPATIBILITY AND AUTHOR'S RAMBLINGS .......... 2
STOWAGE VERSIONS SUPPORTED .................... 2
OVERVIEW ...................................... 3
COMMANDS ...................................... 4-6
SHORTCUTS ..................................... 7
ERROR MESSAGES ................................ 8
ARCHIVE FILE FORMATS AND GENERAL INFORMATION .. 9
PACKING ....................................... 9
SQUEEZING (Huffman Coding) .................... 10-14
CRUNCHING (Lempel-Ziv-Welch) .................. 14-15
STOWAGE VERSIONS .............................. 16
ARCHIVE HEADER FORMAT ......................... 17
HASHING AND CRC ............................... 18
ARC.EXE RELEASE DATES AND VERSIONS ............ 19
NARC.EXE REVISION HISTORY ..................... 19
........................................................................
Narc (c) 1987 Infinity Design Concepts all rights reserved
........................................................................
═══════════════════
WHAT IS IT ANYWAY ?
═══════════════════
NARC is a menu driven de-ARChive facility, written entirely in
assembler. NARC allows you to easily move from ARC file to ARC file,
with the option of viewing or printing or extracting the subfiles from
the ARChive. The program may be operated from the mouse or the
keyboard. Menus are of the musical popup variety to add a little
"TechNoFlash" to the proceedings.
Why....
Because I use a lot of ARC files, and ARC.EXE and the clones
are reminiscent of the early Ward Christensen CP/M days in user
interface etiquette, I wanted something a little more flexible and
friendly to use. I would like to pause here for a second and
give a little credit to Mr. Christensen ( the Don Garlits of CP/M )
for the fine FREE utilities he has given to ALL of us over the years.
The next time you do a modem transfer, you can thank him for the
original XMODEM, from which all others have descended.
Why NARC...
It seemed like a good idea. Short for uN-ARC. The idea
was originally Bob Freed's.
Acknowledgements..
I would like to thank Bob Freed for allowing me to
examine his Z80 code before writing NARC. Bob wrote UNARC for the
CP/M world and is (as of this writing, 4/28/87) working on NOAH,
the ARCing program for CP/M. I would also like to thank System
Enhancement Associates for releasing their source code in "C".
Without both of the above, NARC would have been a much larger chore
than it was. Note also that the crunching algorithm used in ARC.EXE
was taken from COMPRESS, used in UNIX.
A special thanks to Curt Edwards, Jerry Taylor, Frank Roemer,
Gil Levitch and Paul Bowling for their "eagle-eyes" in locating errors.
The Future...
This is the first release of NARC and timing tests have
shown that NARC is significantly faster than ARC.EXE v5.20, but it is
also significantly slower than PKXARC. Since this is my first venture
into the world of ARC files, I don't consider this to be too bad,
but the area of speed will be the main focus for the next version.
EGA support will also be a topic of discussion.
Page 1
═════════════
COMPATIBILITY
═════════════
NARC is compatible with all known "skrunching" algorithms,
that is up to and including Squashing. NARC is compatible with
ARC.EXE version 5.20 and PKxARC. NARC supports Squashing which is
nothing more than a variation of the crunching algorithm. NARC also
recognizes the .ARK extension soon to be prevalent in the CP/M world
via Bob Freed's CP/M archive facility, NOAH.
Author's Ramblings
System Enhancement Associates, I am told, is dropping the
ball as far as ARC.EXE goes. I think that they deserve a great round
of applause for their contribution and help with a formidable problem,
namely, storage space (or lack thereof). Phil Katz has the ball at
this point and I hope that he uses good judgement in future releases.
Just for the record, I don't plan on writing an ARCer, there are too
many hands in the pot already.
The oldest version of ARC.EXE that I can find is version 3.10,
released 5-1-85. This version supports storage methods up to
and including squeezing (no crunching). If anyone has an older version
I would be interested in seeing it. Here is a list of the versions
that I do have; I would be interested in getting any other versions
floating around.
3.10 4.10 4.50 4.52 5.00 5.10 5.12 5.20
═════════════════════════════════════════
ARCHIVE STORAGE METHODS SUPPORTED BY NARC
═════════════════════════════════════════
Packing - all versions
Squeezing - Huffman Coding
Crunching - all versions (LZW encoding)
Squashing - one version (hopefully the only one)
Note: LZW stands for Lempel-Ziv-Welch
Page 2
OverView...
When NARC is first invoked, it saves the current drive/path
for use again on exit, so you always end up where you started. Some
folks like that and some don't. I DO. NARC first searches the default
path for ARC files and if any are found they are displayed in a window
on the left side of the screen. The arrow keys (or the mouse) may
be used to move the cursor bar up and down the window. There are
three ways to select the highlighted ARC file ;
(1) Hit the ENTER key
(2) Press the left mouse button
(3) Hit the F1 key.
NOTE: Function keys F1, F2 and F3 mimic the mouse buttons as ;
F1 = left mouse button
F2 = right mouse button
F3 = both mouse buttons
NARC will determine whether a monochrome or color monitor
is being used and act accordingly. EGA is not directly supported
at this point, because I don't have one and can't test the routines.
After selecting the ARC file of interest, NARC displays
all of its sub-files and their statistics on the screen. You
are also given a menu bar at the bottom of the screen. You may
use the arrow keys or the mouse to move the cursor bar to the
desired selection and then select with the F1 key or the ENTER key
or the left mouse button. As the cursor bar is moved, you are also
given a brief description of the highlighted command. The commands
will now be discussed individually.
Note: You may also select any option from the command bar by
entering the first letter of the command.
The ESCape key will exit the program from any of the
windows or the command bar.
Page 3
════════
COMMANDS
════════
═══════════════
Extract Command
═══════════════
Selecting this option will cause another prompt to be
displayed, asking whether you wish to extract the highlighted
file or tagged files. (Files are tagged with the space bar). "Point
and shoot" here again as before. The ESC key or the right mouse
button will abort the operation.
Highlighted File
When EXTRACT is selected, you will be asked for a drive/path
to extract the file to. If you simply hit ENTER or the F1 key or the
left mouse button, the file will be extracted to the default
drive/path. You may also enter any valid DOS drive/path. The ESC key
or the F2 key or the right mouse button will abort the operation.
Tagged Files
The Space bar (or F3 key) is used to TAG the current file.
When a file is tagged, an asterisk will be displayed on the line with
the current file in column 80. As a side note, the asterisk was chosen
as a reminder of the CP/M days of NSWEEP and B29. The space bar will
also unTAG a file, thus it is a "toggle". When the space bar is
pressed, an asterisk will appear as described above and the cursor bar
will move to the next file. When the "TAGGED" option is selected from
the command line, all files that have been tagged, will be extracted
to the SAME drive/path.
After the file is extracted, its date and time are set to
those contained in the ARC file. The file is also checked for size
and CRC; if both of these do not match exactly what was contained in
the ARC file header, then an error has occurred and the user is
notified. The files will also remain tagged after the extraction.
════════════
View Command
════════════
This option will display the currently highlighted file
on the screen. The space bar is used to pause and the ESC key is
used to abort. No mouse support here, since it slows things down too
much.
Page 4
═════════════
Print Command
═════════════
The print option will print the currently highlighted file.
After selecting the print option, you will be asked which character
pitch you want to print in. Enter the number that you wish (or 0 for
the default pitch) and the printer will be set to that pitch.
NOTE: The printer strings that come installed with NARC are compatible
with EPSON printer strings. If you wish to install NARC with
your own strings, see NARCCFG.DOC for complete instructions.
After you have selected the printer pitch, you will be shown
three more options. Use the arrow keys to move left and right.
The space bar (or right mouse) is used to toggle the options ON or OFF.
When you have finished and are ready to print, hit ENTER
(or left mouse button). The options are;
Format - YES - This option causes NARC to format the
output with page breaks and page numbers.
NO - NARC does not format the file.
The following two options work independently of the Format option.
Strip High - YES - NARC will strip the high bit off each
character before it is sent to the
printer. Some word processors set this
high bit on some characters as a means of
text formatting. These characters will
print as garbage usually.
NO - NARC will not strip the high bit.
Strip Control - YES - NARC will strip all control characters
from the file before it is printed. This
is useful on files that have imbedded
formatting characters, and you wish to
have NARC provide the formatting.
NO - NARC will not strip the control chars.
════════════════
ARC-wind Command
════════════════
Note: The right mouse button will also pop this window up.
This option will display the window containing all of the
ARC files in the current sub-directory. Move the cursor bar up and
down with the mouse or arrow keys and select with the F1 key or
ENTER key or left mouse button.
Page 5
════════════════
DRV-wind Command
════════════════
This option will pop up a window that contains all of the
logical drives that DOS reports to NARC. Select as before and the
ARC-window will be popped up so that you can then choose an ARC file
to examine.
════════════════
SUB-wind Command
════════════════
This option will pop up a window that contains all of the
sub-directories in the current directory. Point and shoot as before.
After making your selection, the window is automatically popped up
again showing the sub-directories in the new current directory. When
you get to the directory that you want, hit the F2 key or the right
mouse button and the window will be put away and the ARC-window will
be popped up so that you can then select an ARC file.
════════════
Quit Command
════════════
I hope I don't need to describe this one.
NOTE: The ESCape key will also exit NARC.
The other options...
The F4 and F5 key functions are displayed in the lower
right hand corner of the screen. The F4 description line will
only be displayed when you have selected an ARC file and the
sub-files are displayed on the screen. If you have popped up one
of the windows, the F4 (Print Sub-Files) will not be displayed.
The F4 key will print an image of the screen less the advertisement
and command lines. The F5 key will invoke the NARC - DOS command
processor. You may then enter any valid DOS command. When you are
finished, simply hit ENTER by itself and you will be returned to
NARC. You may also enter "COMMAND" which will invoke a second copy
of COMMAND.COM, if the file COMMAND.COM is in your search path. To
return to NARC, you would then type "EXIT".
Page 6
Operating Hints and Philosophy and Shortcuts
When NARC is first invoked, it pops up the window showing
all of the ARC files in the current directory. The first thing that
you must always do is to select an ARC file. The reason for this is
that when I want to look inside an ARC file, I will move to that
drive/directory and then call NARC. This saves me from having to
select where I want to look and what drive and all that mess. This
way, when the program comes up, I can go right to work. If you run
NARC and then decide to change drives or directories, you MUST first
select an ARC file from the first window.
When there are no windows popped up, the right mouse button
(or F2 key) will pop up the ARC file window, regardless of where the
command line cursor bar is. This is handy when you have a lot of ARC's
that you want to thumb through, with just the 2 mouse buttons or
F1 and F2 keys, you can look through a whole directory of ARC files
in nothing flat ! Also along these lines, when the ARC window is
on the screen, the right mouse button or the ESCape key will exit the
program.
The control system is a little tricky at first, but I
think you will like it once you get used to it.
NARC buffers 64k of the input file at one time, thus speeding
up file operations. The output buffer is 32k and should be larger in
the next version. NARC requires about 170K of RAM to operate.
(This prime advertising space for rent)
Page 7
══════════════
Error Messages
══════════════
Memory Allocation Error.
- NARC allocates memory when it is invoked; this says that DOS told
NARC that there was not enough memory left to run the program.
ERROR: Extraction Failed due to CRC error, Hit ENTER
- After a file is extracted, the CRC contained in the ARC header is
compared to the CRC that NARC calculates; this message says that
the two were different. The check uses the CRC-16 polynomial.
ERROR: Extraction Failed due to FileSize error
- Same as above, except with filesize
ERROR: Disk File Inconsistency. Hit ENTER
- This will usually mean that the user has swapped disks just after
telling NARC to View, Print or Extract, and NARC does not recognize
the file.
ERROR: Incompatible Crunch Format
- Says that either the stowage code for this file is not supported by
NARC -OR- there is an error in the ARC header
ERROR: Extraction Failed due to Lack of Disk Space
- Self explanatory
Squeezed File Has a Diseased Decode Tree.
- When unsqueezing a file, NARC has found a bad entry in the decode
table.
ERROR: No directory space on destination!
- Self explanatory
Bad Path Name Dude, Hit ENTER
- The destination path that the user entered for extraction is not
a valid DOS pathname; re-enter it.
Requires DOS version 2.0 or above.
- NARC requires DOS 2.0 or above to operate.
Invalid archive file format
- NARC could not find any ARC headers; this is probably not an ARC file
Warning: Bad archive file header, bytes skipped = xxxxx
- If an entry has a bad header, NARC will examine the next 64k bytes
looking for a good header. This is to maintain compatibility with
ARC v.5.20 which allows self-unpacking ARC files.
Unexpected end of ARChive file
- Says that NARC couldn't find the last ARC header
No matching file(s) in ARChive
- ARC file is empty
Page 8
════════════════════════════════════════════
ARCHIVE FILE FORMATS AND GENERAL INFORMATION
════════════════════════════════════════════
For Those With a Little More Curiosity...
The following are the currently supported stowage methods.
1 unpacked (obsolete)
2 unpacked
3 packed
4 squeezed (after packing)
5 crunched (obsolete)
6 crunched (after packing) (obsolete)
7 crunched (after packing, using faster hash algorithm)
8 crunched (after packing, using dynamic LZW variations)
9 Squashed c/o Phil Katz (no packing) (variation on crunching)
NOTE: LZW is Lempel-Ziv-Welch crunching algorithm
A little about the stowage methods.
Packing -
This is the simplest of the storage methods. Suppose that
you have a line of text and at the end of the line, you have 40
spaces. These 40 spaces are compressed into 3 bytes in the ARC file.
The first byte is the actual character to be expanded (in our case
a space). The second byte is a special "flag" byte that indicates
that we need to expand these bytes. The third byte is the count byte
(in our case it would be 40). So you can see that any time the ARC'er
finds repeated bytes like this, it can compress them into 3 bytes.
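A decoder for this three-byte run scheme can be sketched in a few lines
of Python. The 90h flag value and the count-of-zero escape for a literal
flag byte are conventions of the RLE90 packer used by the ARC family; the
text above only specifies the char/flag/count shape, so treat those two
details as assumptions.

```python
FLAG = 0x90  # run-marker byte; 90h is an assumption (the RLE90 convention)

def unpack(data):
    """Expand <char><flag><count> runs; count is the total run length."""
    out = bytearray()
    last = 0
    i = 0
    while i < len(data):
        b = data[i]
        if b == FLAG:
            count = data[i + 1]
            if count == 0:                 # escaped literal flag byte
                out.append(FLAG)
                last = FLAG
            else:                          # 'last' was already emitted once
                out.extend([last] * (count - 1))
            i += 2
        else:
            out.append(b)
            last = b
            i += 1
    return bytes(out)
```

With this reading, the 40 spaces from the example pack into the three
bytes 20h 90h 28h, and unpack() expands them back to the original run.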
Page 9
══════════════════════════
HUFFMAN CODING (SQUEEZING)
══════════════════════════
It does, at first, seem that making a file smaller would
be an impossible task. I will make an attempt here to shed a little
light on this subject since that is a question that I hear pretty
frequently and it is not a two minute discussion question. It does
require some thought.
To compress a file with the Huffman algorithm, commonly called
squeezing, the first thing that must be done is to read the file
completely and count the occurrences of each character. That is, you
count the "A" 's and the "B" 's and so forth. There are 256 characters
in the extended ASCII character set, of which approximately 90 are
"printable", that is you can see them on the screen. The IBM set has
more "printables", but that is of no consequence, since the squeezer
deals only with the numbers and doesn't care whether or not the file is
an ASCII text file or an EXE file. Once the squeezer has counted the
occurrences of each character, and thus the frequency of occurrence, it
scans the table for the characters that appear the least number of
times and forms an imaginary link between them, called a node.
Somewhere else in the tree, we will later develop a pointer that
points to this node. When you start putting all of these things
together, you will form a binary tree in memory. Confused enough ?
Let us try an example.
We have a file that is 100 bytes long and has 6 different
characters in it. We have counted the occurrence of each of the
characters and found the following.
quantity character
5 - D
10 - A
10 - F
20 - B
25 - E
30 - C
The spelling in the file wasn't very good, but we don't care.
Now we take these numbers and will call them the frequency of each
character. We then arrange the table as below. This is an arbitrary
arrangement, but it is useful here so as to make our tree readable
on the screen. The arrangement makes no difference.
Frequency 30 10 5 10 20 25
Character C A D F B E
Page 10
We then examine the table to find the two characters with
the smallest frequency of occurrence. In our case, it is obvious that
one of them is the 5, but which 10 do we choose? As it turns out, it
doesn't matter which one you choose; we will arbitrarily choose the F.
We draw lines from the D and the F to form our node (the box below).
Frequency 30 10 5 10 20 25
Character C A D F B E
\ /
\ /
┌──┐
│15│ = 5 + 10
└──┘
The number in the box is the sum of the frequencies of the
D and F characters. Now we again look for the lowest two frequencies,
except this time we do not consider the D and F characters individually;
we instead consider the node. The lowest two now are the A and the node,
that is 10 and 15. We again do some artwork.
Frequency 30 10 5 10 20 25
Character C A D F B E
\ \ /
\ \ /
\ ┌──┐
\ │15│ = 5 + 10
\ └──┘
\ /
\ /
┌──┐
│25│ = 10 + 15
└──┘
We look at the table again for the next two lowest frequencies
and now find B and E . We continue in this fashion until
the entire "tree" is built, that is until it all condenses to one node.
The leaves are the actual characters at the top of the tree and the
nodes represent branch joints, with the root at the bottom.
Page 11
Frequency 30 10 5 10 20 25
Character C A D F B E
\ \ \ / \ /
\ \ \ / \ /
\ \ ┌──┐ ┌──┐
\ \ │15│ │45│
\ \ └──┘ └──┘
\ \ / /
\ \ / /
\ ┌──┐ /
\ │25│ /
\ └──┘ /
\ / /
\ / /
┌──┐ /
│55│ /
└──┘ /
\ /
\ /
┌────┐
│ROOT│
└────┘
Now that our tree is made up, we can encode the file. We start
at the root (always). To encode the first character (leaf) of the tree
(the letter C), we trace up the tree until we hit the letter C at the
top. Along our journey, if we make a left turn, we record a 0 bit,
and a 1 for a right turn. So for the C, we would go left to 55
(and record a 0) and then left again to the letter C (and record
another 0), so the Huffman code for our letter C is 00. For A we
go left to 55, right to 25 and left to A and it becomes 010. By doing
all of the letters this way, we find the following.
C = 00 ( 2 bits )
A = 010 ( 3 bits )
D = 0110 ( 4 bits )
F = 0111 ( 4 bits )
B = 10 ( 2 bits )
E = 11 ( 2 bits )
Mind that the zeroes and ones above are bits and not bytes.
Each character was represented in the original file by 8 bits (one byte)
and since we have reduced the number of bits needed to represent each
character, we therefore reduce the size of the file. The savings add up
as follows,
Page 12
character frequency original bits squeezed bits savings
C 30 30 x 8 = 240 30 x 2 = 60 180
A 10 10 x 8 = 80 10 x 3 = 30 50
D 5 5 x 8 = 40 5 x 4 = 20 20
F 10 10 x 8 = 80 10 x 4 = 40 40
B 20 20 x 8 = 160 20 x 2 = 40 120
E 25 25 x 8 = 200 25 x 2 = 50 150
══════════ ══════ ═════ ═════
Totals 100 800 240 560
│ │
original file size ──────┘ │
squeezed file size ───────────────────────┘
240 is 30% of 800, so we have compressed this file by 70%.
Golly Wally, that seems pretty good. The rub lies in the fact that in
order to reconstruct the original file, we must have access to the
decode tree and since each tree will be different for each file, we
must therefore save the tree with the file. It turns out that the tree
can have only 255 nodes in a bytewise compression technique, and since
each node holds 4 bytes of pointers, a full table will be about 1k long.
The table in our example has 5 nodes plus the 6 leaf nodes (where our
characters are), totalling 11. 4 times 11 is 44 and if we add a few
bytes for storing the node count and some other statistics, our table
is about 50 bytes long. The 240 in the table above is the total
number of bits that it will take to encode the file; divide 240 by 8
to get the number of bytes (30) and add this to our 50, and we get a
compressed file size of 80 bytes. Since our original file was
100 bytes, we have achieved a 20% reduction in file size. Not bad.
What we have really accomplished is a translation of character sets,
with our new set requiring less space than the original ASCII set.
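The whole construction above can be checked mechanically. The sketch
below builds the tree for the example frequencies with a heap;
tie-breaking may differ from the hand-drawn tree (the A and F codes can
swap, since both occur 10 times), but the total comes out to the same
240 bits.

```python
import heapq

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a frequency table."""
    # Heap entries: (frequency, unique tiebreak id, {symbol: depth so far})
    heap = [(f, i, {sym: 0}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    uid = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)   # the two lowest frequencies...
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, uid, merged))   # ...form a node
        uid += 1
    return heap[0][2]

freqs = {"D": 5, "A": 10, "F": 10, "B": 20, "E": 25, "C": 30}
lengths = huffman_code_lengths(freqs)
total_bits = sum(freqs[s] * n for s, n in lengths.items())   # 240
```

Note that the total weighted length is just the sum of all the node
values (15 + 25 + 45 + 55 + 100 = 240), since each character's frequency
is counted once for every node on its path to the root.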
How far can we go ?
If we look at the maximums that we can obtain for the different
bit combinations in an optimally skewed tree, that is a tree that is not
exactly symmetrical, we find the following code counts.
2 - 1 bit codes
4 - 2 bit codes
8 - 3 bit codes
16 - 4 bit codes
32 - 5 bit codes
64 - 6 bit codes
128 - 7 bit codes
--------
254
Page 13
And since we have a total of 256 different bytes to encode,
the remaining 2 characters must have 8 bit codes. If we add the number
of bits that this represents, we find a total of 1554 bits, or about
195 bytes. So at maximum, we have compressed these 256 bytes to 195, a
reduction of about 24%, which is the idealistic limit for this maximally
skewed tree when using a byte level implementation.
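The tally above is easy to verify with a few lines of arithmetic:

```python
# codes available at each bit length in the maximally skewed tree,
# plus the 2 leftover symbols that must take 8-bit codes
code_counts = {1: 2, 2: 4, 3: 8, 4: 16, 5: 32, 6: 64, 7: 128, 8: 2}

total_symbols = sum(code_counts.values())                        # 256
total_bits = sum(n * count for n, count in code_counts.items())  # 1554
total_bytes = -(-total_bits // 8)                                # 195, rounded up
```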
One final note: the Huffman scheme requires the input file
to be read twice, once to count characters and frequencies and then
again to do the actual encoding. The major differences between Huffman
coding and crunching lie in the fact that crunching is a one pass
operation and does not require the table to be stored with the file.
Also, Huffman coding is extremely vulnerable to errors; for example,
imagine what would happen if you slipped one bit while reading a
squeezed file: all of the remaining characters in the file would become
the proverbial garbage, since we are looking at the file on a bit
level.
NARC uses the method described in K. & R. ppg. 130 for
setting up the binary tree with several modifications. The simple
binary tree is acceptable for this, since the tree never grows and
therefore will never become unbalanced.
If you followed that, now go back and read the second
paragraph again, maybe it will make sense this time.
═════════
CRUNCHING
═════════
Crunching began with an article by J. Ziv and A. Lempel
in IEEE Trans. Information Theory, May 1977, where the method was
originally described. Terry A. Welch wrote a definitive application
article in IEEE Computer June 1984 which described in detail how
to apply the algorithm and some common problems encountered. Thus
the name LZW compression.
Crunching takes the Huffman coding method a step further
as it does not include a table with the crunched file. The crunching
algorithm also "learns" as it proceeds through the file. If it finds
repeated strings in the file, they will be encoded into a table. This
table is set up (in NARC's implementation) as a 4096 by 3 table.
Each entry is formatted as <PREF>,<SUFFIX>, where PREF is a 2 byte
pointer to another entry. SUFFIX is the last byte of the string.
Representing the PREF's as pointers rather than values speeds up
most operations in NARC. This idea came from Bob Freed.
One obvious benefit of crunched files is the fact that there
is no need to include the encoding table in the compressed file as was
the case with squeezing. Another great benefit is the fact that
crunching is a one pass operation as opposed to the two pass situation
in squeezed files.
Page 14
Crunching begins by creating an "atomic" table, that is a
table in RAM that contains 256 entries, one for each character in the
extended ASCII set. The values range sequentially from 0 to 255. The
table entries are arranged as follows.
Prefix Pointer (2 bytes) and Suffix byte (1 byte)
In this initial table setup, the Suffix bytes are the 0 through 255
mentioned above. These are the "atomic" characters, in that they
must be in the table before crunching can begin, since all files
contain some portion of this character set. We do not know which
characters will be included in any given file and which ones will be
excluded, so we must include them all in our initial table. Once this
table is set up, we can begin crunching.
The Prefix pointer will contain a value that is a pointer
to another string. When the table is initially set up, there are no
other strings, so this Prefix pointer is set to a special "Null"
string; that is, it points nowhere. We must be careful when crunching
the file, to look for this special pointer and act accordingly when
we encounter it.
This Prefix and Suffix business is used to "build" long
strings. If we read the input file and found that the first character
was the letter "I", we would look this letter up in the string table
by hashing (computing an address). If we found the letter in the table
(which we certainly will on the first character), then we output its
"hashed" address as a code to the output file (the crunched file).
Suppose then, that the next character input from the file was the
letter "D", the cruncher would then look at the I and the D together
to see if they exist as a string in the table. Well of course, since
this is the second character of the file, we know that it doesn't,
so the cruncher forms a new entry in the string table. This entry
has as its Prefix pointer a value that "points" to the letter "I"
entry in the table that we made a minute ago. The suffix byte in this
case would be the letter "D". Now another code is output to the
crunched file, representing the letter "D". Well this is great,
we have now turned 2 bytes from the input file (16 bits) into 3 bytes
in the output file (24 bits). You are correct, crunching begins by
"not crunching", but it is a crazy world ! The real value becomes
apparent when we run into this same sequence "ID" in the input file
again. Now we will find an entry for it in the string table and we
can output a single 12 bit code that stands for "ID", thus saving
4 bits. The cruncher continues "learning" strings like this until
the file is crunched. It should be noted that the string table is
dynamically changing as the input file is processed.
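The table-building walk described above is the core of every LZW
cruncher. Here is a minimal sketch of that loop in Python; it emits
whole integer codes rather than the packed 12-bit output an ARC file
actually contains, and it uses a plain dictionary where NARC hashes,
but the "learning" behavior is the same.

```python
def crunch(data):
    """Minimal LZW: returns a list of integer codes (not yet bit-packed)."""
    table = {bytes([i]): i for i in range(256)}   # the "atomic" entries
    next_code = 256
    out = []
    w = b""
    for b in data:
        wc = w + bytes([b])
        if wc in table:            # known string: keep growing it
            w = wc
        else:                      # new string: emit the prefix, learn wc
            out.append(table[w])
            if next_code < 4096:   # 12-bit code space, as in NARC's table
                table[wc] = next_code
                next_code += 1
            w = bytes([b])
    if w:
        out.append(table[w])
    return out
```

Running crunch(b"IDIDID") yields four codes for six input bytes: after
the first "ID" is learned as code 256, each later "ID" collapses to that
single code, just as the text describes.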
NARC supports all of the current "skrunching" algorithms.
A brief description of each follows.
Page 15
Version 1 - "STORED" File is simply stored (obsolete now, 25 byte header)
NOTE: Files stored with this version are rare.
Version 2 - "STORED" Current version of simple storage
This version was introduced along with file compression. The
main difference between versions 1 and 2 is the ARC header (see
the header section below); the version 1 header is 4 bytes
smaller than that of any of the rest of the storage methods,
since in version 1 there was no need to store the original file
length separately from the stored file length; they were the
same.
Version 3 - "PACKED" Repeated bytes are packed into a three byte string
(see Packing above)
Version 4 - "SQUEEZED" after packing
The file is first packed as described above and then squeezed
Version 5 - "CRUNCHED" This is the first implementation of LZW released
Uses fixed length codes and a complex hashing function.
(obsolete now) (See hashing below)
NOTE: Files compressed with this version are hard to find.
The version had been out for only a month when the next
version was released.
Version 6 - "CRUNCHED" after packing
The file is first packed and then crunched. Uses fixed length
codes and the same hashing function as version 5.
Version 7 - "CRUNCHED" after packing
Same as version 6 except a faster hashing function is used.
NOTE: Thom Henderson (author of ARC) has this to say about
version 7. "This approach was abandoned because dynamic
Lempel-Ziv works as well, and packs smaller also. However
inadvertent release of a developmental copy forces us to
leave this in."
Page 16
Version 8 - "CRUNCHED" after packing
Uses variable length codes in the crunched file (9 to 12 bits)
and has a faster hash function (actually not hashing at all,
but for the sake of consistency, we will call it that)
This version also resets the string table when it becomes full
which benefits the compression ratio of larger files. This
resetting is commonly called an adaptive reset.
Version 9 - "SQUASHING" (variation on crunching scheme)
This version uses the same hashing function as version 8
but varies the crunching codes from 9 to 13 bits. There is
also no packing, which affords the string table the opportunity
to "learn" longer codes and thus improve the compression
ratio (benefits "real large" files).
═════════════════════
ARCHIVE HEADER FORMAT
═════════════════════
The ARC file header structure is the same for both DOS and CP/M.
Byte number Value(s) Meaning
─────────────────────────────────────────────────────────────────────────────
1 1Ah Header Flag
2 0-9 Compression Version
3-15 --- ASCIIZ compressed filename
16-19 --- Compressed file size in bytes
Low Word, High Word with each word
in LoHi format
20-21 bits DOS date format
15-9 Year
8-5 Month
4-0 Day (all zeroes means no date)
22-23 bits DOS time format
15-11 Hours (military)
10-5 Minutes
4-0 Seconds
24-25 --- CRC-16 in LoHi format of uncompressed
file. ------- This is important.
26-29 --- Original uncompressed file size
NOTE: Version 1 files are not compressed
so the length can be found with
bytes 16-19, also the header len
for version 1 files is 25 bytes.
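The layout above maps directly onto a 29-byte little-endian record. The
sketch below parses a version 2 or later header (version 1 headers are
the shorter 25-byte form without the final length field). The +1980 year
offset and the 2-second granularity are standard DOS date/time
conventions not spelled out in the table, so they are assumptions here.

```python
import struct

# 1Ah flag, version, 13-byte ASCIIZ name, sizes/date/time/CRC, little-endian
ARC_HEADER = struct.Struct("<BB13sLHHHL")   # 29 bytes, no padding

def parse_header(buf):
    """Parse one version 2+ ARC entry header from the start of buf."""
    flag, version, name, csize, date, time, crc, osize = ARC_HEADER.unpack_from(buf)
    if flag != 0x1A:
        raise ValueError("missing 1Ah header flag")
    return {
        "version": version,
        "name": name.split(b"\0", 1)[0].decode("ascii"),
        "compressed_size": csize,
        "year": (date >> 9) + 1980,    # DOS dates count years from 1980
        "month": (date >> 5) & 0x0F,
        "day": date & 0x1F,
        "hour": time >> 11,
        "minute": (time >> 5) & 0x3F,
        "second": (time & 0x1F) * 2,   # DOS stores seconds divided by 2
        "crc": crc,
        "original_size": osize,
    }
```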
Page 17
Hashing.....
Hashing is simply an arithmetic way of coming up with
an address in a table. You have a set of input numbers (codes)
that will map one-to-one with the output codes in an ideal situation.
That is, each time you input a certain number, you can rest assured
that the output will always return the same output number. This
is not quite the case in the current versions of .ARC files. The
reason is that the algorithm would require a MEG or so of RAM for
implementation. The utilization of a smaller string table in all of
the ARC programs introduces the possibility of producing the same
output number for 2 or more input numbers. This is called a hash
collision. This is handled separately in .ARC files with what is
called a "collision table", which helps to locate the correct table
entry.
There are three versions of "hashing" used in ARC files.
Hash1 - Key = upper 12 bits of lower 18 bits of unsigned square of
(prefix code + suffix byte) OR 800h
Used in stowage versions 5 and 6
Hash2 - Key = lower 12 bits of unsigned product of
(prefix code + suffix byte) * 15073
Used in stowage version 7
Hash3 - Key = next available address in table.
Used in stowage versions 8 and 9
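For the curious, here is one literal, bit-level reading of the two
arithmetic recipes in Python. The exact bit positions (in particular how
"upper 12 bits of lower 18 bits" is sliced) are my interpretation of the
prose and have not been checked against the ARC source, so treat this as
illustration only.

```python
def hash_v5_v6(prefix_code, suffix_byte):
    """Hash1: upper 12 of the lower 18 bits of the square, OR'ed with 800h."""
    sq = (prefix_code + suffix_byte) ** 2
    return ((sq & 0x3FFFF) >> 6) | 0x800

def hash_v7(prefix_code, suffix_byte):
    """Hash2: lower 12 bits of the unsigned product with 15073."""
    return ((prefix_code + suffix_byte) * 15073) & 0xFFF
```

Either way, the key is a 12-bit index into the 4096-entry string table,
and collisions are resolved through the collision table described above.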
CRC calculations -
NARC does not use the traditional table lookup method for
calculating the CRC of the file. The table approach requires the table
to be in RAM and takes up more space. NARC calculates the CRC on the
fly, which requires no table and saves space. The algorithm is taken
from the definitive article by Aram Perez in IEEE Micro, June '83.
The polynomial is X^16 + X^15 + X^2 + 1, which is not compatible with
the Xmodem CRC.
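A table-free, bit-at-a-time form of this CRC (known in modern parameter
catalogues as CRC-16/ARC, with the reflected polynomial A001h) can be
sketched as follows. This mirrors the on-the-fly approach described
above, though NARC itself of course does it in assembler:

```python
def crc16(data):
    """Bit-serial CRC-16 (x^16 + x^15 + x^2 + 1, reflected), initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001   # A001h = bit-reflected 8005h
            else:
                crc >>= 1
    return crc
```

The standard check value for this CRC is crc16(b"123456789") == BB3Dh.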
Gary Conway
Page 18
══════════════════════════════
ARC RELEASE DATES AND VERSIONS
══════════════════════════════
These are the various versions of ARC.EXE that I have and what
versions of storage they supported. PKxARC supports all of these
methods as well since they were all originally created by ARC.EXE.
Date Stowage Methods
Released Version Supported
05-01-85 3.10 Storing,Packing,Squeezing (1-4)
( version 5 in here somewhere)
06-26-85 4.10 Up to version 6 of crunching
11-18-85 4.50 Up to version 6 of crunching
12-04-85 4.52 Up to version 6 of crunching
( version 7 in here somewhere)
01-21-86 5.00 Up to version 8 of crunching
01-31-86 5.10 Up to version 8 of crunching
02-05-86 5.12 Up to version 8 of crunching
10-24-86 5.20 Up to version 8 of crunching
This list is compiled in an attempt to start some kind
of historical record as to what transpired in the ARC world. I would
be interested in hearing of additions.
═════════════════════════
NARC.EXE REVISION HISTORY
═════════════════════════
Version 1.0 - First Release 6.10.87
Version 1.1
- Fixed bug in tagged extraction to other than default path
- Deletes partial file when extraction fails due to lack of disk space
- Fixed bug dealing with improper end of file condition in HEX view
- Stronger ARChive header checking. Handles special case where two
1Ah bytes are encountered after a bad header
- Fixed bug with PK zero length files
- Introduction of NARCCFG.EXE, replacing NARCLOAD.EXE and NARC.CFG
End of NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 19